1 PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel
Computing

Kevin A. Huck, Allen D. Malony 

November 2005 Proceedings of the 2005 ACM/IEEE conference on Supercomputing SC '05

Publisher: IEEE Computer Society

Full text available: pdf(2.26 MB) Additional Information: full citation, abstract, index terms

Parallel applications running on high-end computer systems manifest a complexity of 
performance phenomena. Tools to observe parallel performance attempt to capture these 
phenomena in measurement datasets rich with information relating multiple performance 
metrics to execution dynamics and parameters specific to the application-system
experiment. However, the potential size of datasets and the need to assimilate results 
from multiple experiments makes it a daunting challenge to not only process t ... 

2 DB-3 (databases): data mining: Framework and algorithms for trend analysis in
massive temporal data sets

Sreenivas Gollapudi, D. Sivakumar

November 2004 Proceedings of the thirteenth ACM international conference on
Information and knowledge management CIKM '04
Publisher: ACM Press

Full text available: pdf(235.70 KB) Additional Information: full citation, abstract, references, citings, index terms

Mining massive temporal data streams for significant trends, emerging buzz, and 
unusually high or low activity is an important problem with several commercial
applications. In this paper, we propose a framework based on relational records and 
metric spaces to study such problems. Our framework provides the necessary 
mathematical underpinnings for this genre of problems, and leads to efficient algorithms 
in the stream/sort model of massive data sets (where the algorithm makes passes over
the d ... 

Keywords: data stream algorithms, hierarchically partitioned data, metric
approximations, taxonomies, trend analysis 

Hui Yang, Srinivasan Parthasarathy, Sameep Mehta

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on
Knowledge discovery in data mining KDD '05

Publisher: ACM Press

Full text available: pdf(1.05 MB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we present a general framework to discover spatial associations and spatio- 
temporal episodes for scientific datasets. In contrast to previous work in this area, 
features are modeled as geometric objects rather than points. We define multiple distance 
metrics that take into account objects' extent and thus are more robust in capturing the
influence of an object on other objects in spatial neighborhood. We have developed 
algorithms to discover four different types of spatial object ... 

Keywords: scientific data, spatial object association, spatio-temporal association/episode 



Technical opinion: Component-based data mining frameworks
Fernando Berzal, Ignacio Blanco, Juan-Carlos Cubero, Nicolas Marin
December 2002 Communications of the ACM, Volume 45 Issue 12
Publisher: ACM Press

Full text available: pdf(110.82 KB), html(18.89 KB) Additional Information: full citation, abstract, references, citings, index terms

OLAP vs. OLTP in the middle tier.

Declarative data mining: A framework for data mining and KDD
Ingolf Geist

March 2002 Proceedings of the 2002 ACM symposium on Applied computing SAC '02
Publisher: ACM Press

Full text available: pdf(552.51 KB) Additional Information: full citation, abstract, references, citings, index terms

The KDD process is a non-trivial, iterative, interactive and multi-step process, that
requires the development of a unifying model. This model have to ensure an uniform
description of data and patterns and the control of the manipulation of the data and
patterns. Thus, the model defines operations within the pattern and data, as well as
transition operations between data and patterns. This paper proposes a framework
consisting of a model view, a data view and a process view. It focuses on the mod ... 

Keywords: constraint databases, knowledge discovery in databases, mining model 



Research sessions: data mining applications: Cost-based labeling of groups of mass
spectra

Lei Chen, Zheng Huang, Raghu Ramakrishnan

June 2004 Proceedings of the 2004 ACM SIGMOD international conference on
Management of data SIGMOD '04

Publisher: ACM Press

Full text available: pdf(351.21 KB) Additional Information: full citation, abstract, references

We make two main contributions in this paper. First, we motivate and introduce a novel
class of data mining problems that arise in labeling a group of mass spectra, specifically 
for analysis of atmospheric aerosols, but with natural applications to market-basket 
datasets. This builds upon other recent work in which we introduced the problem of 
labeling a single spectrum, and is motivated by the advent of a new generation of Aerosol 
Time-of-Flight Spectrometers, which are capable of generating ma ...

7 Research track posters: A microeconomic data mining problem: customer-oriented
catalog segmentation

Martin Ester, Rong Ge, Wen Jin, Zengjian Hu

August 2004 Proceedings of the tenth ACM SIGKDD international conference on
Knowledge discovery and data mining KDD '04

Publisher: ACM Press

Full text available: pdf(196.37 KB) Additional Information: full citation, abstract, references, index terms

The microeconomic framework for data mining [7] assumes that an enterprise chooses a 
decision maximizing the overall utility over all customers where the contribution of a 
customer is a function of the data available on that customer. In Catalog Segmentation,
the enterprise wants to design k product catalogs of size r that maximize the overall 
number of catalog products purchased. However, there are many applications where a 
customer, once attracted to an enterprise, would purchase more products ... 

Keywords: catalog segmentation, clustering, microeconomic data mining 



Data mining (DM): Expanding the taxonomies of bibliographic archives with
persistent long-term themes
Rene Schult, Myra Spiliopoulou

April 2006 Proceedings of the 2006 ACM symposium on Applied computing SAC '06
Publisher: ACM Press

Full text available: pdf(210.33 KB) Additional Information: full citation, abstract, references, index terms

As document collections accumulate over time, some of the discussion subjects in them
become outfashioned, while new ones emerge. In this paper, we address the challenge of
finding such emerging and persistent "themes", i.e. subjects that live long enough to be
incorporated into a taxonomy or ontology describing the document collection. Our method
is based on similarity-based clustering and cluster label construction and focusses on the
identification of cluster labels that "survive" cha ...

Keywords: clustering, labeling, time series 



A framework for constructing features and models for intrusion detection systems
Wenke Lee, Salvatore J. Stolfo

November 2000 ACM Transactions on Information and System Security (TISSEC),
Volume 3 Issue 4

Publisher: ACM Press

Full text available: pdf(187.03 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Intrusion detection (ID) is an important component of infrastructure protection 
mechanisms. Intrusion detection systems (IDSs) need to be accurate, adaptive, and 
extensible. Given these requirements and the complexities of today's network 
environments, we need a more systematic and automated IDS development process 
rather than the pure knowledge encoding and engineering approaches. This article
describes a novel framework, MADAM ID, for Mining Audit Data for Automated Models for
Intrusion ...

Keywords: data mining, feature construction, intrusion detection 

Md. Zahidul Islam, Ljiljana Brankovic

January 2004 Proceedings of the second workshop on Australasian information
security, Data Mining and Web Intelligence, and Software
Internationalisation - Volume 32 ACSW Frontiers '04

Publisher: Australian Computer Society, Inc. 

Full text available: pdf(365.56 KB) Additional Information: full citation, abstract, references, index terms

Nowadays organizations all over the world are dependent on mining gigantic datasets. 
These datasets typically contain delicate individual information, which inevitably gets
exposed to different parties. Consequently privacy issues are constantly under the
limelight and the public dissatisfaction may well threaten the exercise of data mining and
all its benefits. It is thus of great importance to develop adequate security techniques for
protecting confidentiality of individual values used for dat 

Keywords: data mining, data security, noise addition, privacy, statistical database 



Multi Relational Data Mining (MRDM): Scalability and efficiency in multi-relational
data mining

Hendrik Blockeel, Michèle Sebag

July 2003 ACM SIGKDD Explorations Newsletter, Volume 5 Issue 1
Publisher: ACM Press

Full text available: pdf(1.61 MB) Additional Information: full citation, abstract, references, citings

Efficiency and scalability have always been important concerns in the field of data mining,
and are even more so in the multi-relational context, which is inherently more complex.
The issue has been receiving an increasing amount of attention during the last few years,
and quite a number of theoretical results, algorithms and implementations have been
presented that explicitly aim at improving the efficiency and scalability of multi-relational
data mining approaches. With this article we attempt ... 

Data mining and aggregation: Enhanced mining of association rules from data cubes
Riadh Ben Messaoud, Sabine Loudcher Rabaséda, Omar Boussaid, Rokia Missaoui
November 2006 Proceedings of the 9th ACM international workshop on Data
warehousing and OLAP DOLAP '06

Publisher: ACM Press

Full text available: pdf(469.79 KB) Additional Information: full citation, abstract, references, index terms

On-line analytical processing (OLAP) provides tools to explore and navigate into data 
cubes in order to extract interesting information. Nevertheless, OLAP is not capable of
explaining relationships that could exist in a data cube. Association rules are one kind of
data mining techniques which finds associations among data. In this paper, we propose a
framework for mining inter-dimensional association rules from data cubes according to a
sum-based aggregate measure more general than simpl ...

Keywords: OLAP, association rules, data cubes 



Research track papers: Mining, indexing, and querying historical spatiotemporal data
Nikos Mamoulis, Huiping Cao, George Kollios, Marios Hadjieleftheriou, Yufei Tao, David W.
Cheung

August 2004 Proceedings of the tenth ACM SIGKDD international conference on
Knowledge discovery and data mining KDD '04

Publisher: ACM Press

Full text available: pdf(347.95 KB) Additional Information: full citation, abstract, references, citings, index terms

In many applications that track and analyze spatiotemporal data, movements obey
periodic patterns; the objects follow the same routes (approximately) over regular time
intervals. For example, people wake up at the same time and follow more or less the
same route to their work everyday. The discovery of hidden periodic patterns in
spatiotemporal data, apart from unveiling important information to the data analyst, can
facilitate data management substantially. Based on this observation, we propose ... 

Keywords: indexing, pattern mining, spatiotemporal data, trajectories



Web Data Mining: Effective personalization based on association rule discovery from
web usage data

Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa

November 2001 Proceedings of the 3rd international workshop on Web information
and data management WIDM '01
Publisher: ACM Press

Full text available: pdf(521.65 KB) Additional Information: full citation, abstract, references, citings, index terms

To engage visitors to a Web site at a very early stage (i.e., before registration or
authentication), personalization tools must rely primarily on clickstream data captured in
Web server logs. The lack of explicit user ratings as well as the sparse nature and the
large volume of data in such a setting poses serious challenges to standard collaborative
filtering techniques in terms of scalability and performance. Web usage mining techniques
such as clustering that rely on offline pattern discover ... 

Keywords: association rules, collaborative filtering, personalization, web usage mining



Industrial and government applications track posters: A component-based framework
for knowledge discovery in bioinformatics
Julien Etienne, Bernd Wachmann, Lei Zhang

August 2006 Proceedings of the 12th ACM SIGKDD international conference on
Knowledge discovery and data mining KDD '06

Publisher: ACM Press

Full text available: pdf(994.29 KB) Additional Information: full citation, abstract, references, index terms

Motivation: In the field of bioinformatics there is an emerging need to integrate all 
knowledge discovery steps into a standardized modular framework. Indeed,
component-based development can significantly enhance reusability and productivity for short
timeline projects with a small team. We present Interactive Knowledge Discovery and
Data mining (iKDD), an application framework written in Java that was specifically 
designed for these purposes. Results: iKDD consists of a component-b ... 

Keywords: bioinformatics, data mining, workflow 



Data mining: A partial join approach for mining co-location patterns 

Jin Soung Yoo, Shashi Shekhar

November 2004 Proceedings of the 12th annual ACM international workshop on
Geographic information systems GIS '04
Publisher: ACM Press

Full text available: pdf(196.50 KB) Additional Information: full citation, abstract, references, index terms

Spatial co-location patterns represent the subsets of events whose instances are
frequently located together in geographic space. We identified the computational
bottleneck in the execution time of a current co-location mining algorithm. A large fraction
of the join-based co-location miner algorithm is devoted to computing joins to identify
instances of candidate co-location patterns. We propose a novel partial-join
approach for mining co-location patterns efficiently. It trans ... 

Keywords: association rule, co-location, join, spatial data mining 



Multi Relational Data Mining (MRDM): State of the art of graph-based data mining
Takashi Washio, Hiroshi Motoda

July 2003 ACM SIGKDD Explorations Newsletter, Volume 5 Issue 1
Publisher: ACM Press

Full text available: pdf(1.20 MB) Additional Information: full citation, abstract, references, citings

The need for mining structured data has increased in the past few years. One of the best
studied data structures in computer science and discrete mathematics are graphs. It can
therefore be no surprise that graph based data mining has become quite popular in the
last few years. This article introduces the theoretical basis of graph based data mining and
surveys the state of the art of graph-based data mining. Brief descriptions of some 
representative approaches are provided as well. 

Keywords: data mining, graph, graph-based data mining, path, structured data, tree 



Monitoring data streams: A framework for diagnosing changes in evolving data
streams
Charu C. Aggarwal

June 2003 Proceedings of the 2003 ACM SIGMOD international conference on
Management of data SIGMOD '03

Publisher: ACM Press

Full text available: pdf(312.62 KB) Additional Information: full citation, abstract, references, citings, index terms

In recent years, the progress in hardware technology has made it possible for 
organizations to store and record large streams of transactional data. This results in 
databases which grow without limit at a rapid rate. This data can often show important 
changes in trends over time. In such cases, it is useful to understand, visualize and
diagnose the evolution of these trends. When the data streams are fast and continuous, it
becomes important to analyze and predict the trends quickly in online fa ...




Data mining, hypergraph transversals, and machine learning (extended abstract) 
Dimitrios Gunopulos, Heikki Mannila, Roni Khardon, Hannu Toivonen

May 1997 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on
Principles of database systems PODS '97

Publisher: ACM Press

Full text available: pdf(1.50 MB) Additional Information: full citation, references, citings, index terms



Reviewed articles: An internet routing forensics framework for discovering rules of
abnormal BGP events

Jun Li, Dejing Dou, Zhen Wu, Shiwoong Kim, Vikash Agarwal

October 2005 ACM SIGCOMM Computer Communication Review, Volume 35 Issue 5
Publisher: ACM Press

Full text available: pdf(310.62 KB) Additional Information: full citation, abstract, references, index terms

Abnormal BGP events such as attacks, misconfigurations, electricity failures, can cause
anomalous or pathological routing behavior at either global level or prefix level, and thus
must be detected in their early stages. Instead of using ad hoc methods to analyze BGP
data, in this paper we introduce an Internet Routing Forensics framework to
systematically process BGP routing data, discover rules of abnormal BGP events, and
apply these rules to detect the occurrences of these events. In particula ...

Keywords: abnormal BGP events, blackout, data mining, internet worms, routing 
forensics 